{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "abea4469",
   "metadata": {},
   "source": [
    "# Adding Adapter Support to Qwen 2.5 via Plugin Interface\n",
    "\n",
    "This notebook demonstrates how to add adapter support to a custom or not pre-supported model with the _Adapters_ library's **[plugin interface](https://docs.adapterhub.ml/plugin_interface.html)**. Specifically, we'll be writing a plugin interface for the **Qwen 2.5** model and then train an adapter for mathematical reasoning."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9334584",
   "metadata": {},
   "source": [
    "## 1. Installing Required Libraries\n",
    "\n",
    "First, let's install the necessary libraries if you haven't already."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "775dc1e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install the adapters library and transformers\n",
    "!pip install -q adapters transformers datasets trl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43cd4728",
   "metadata": {},
   "source": [
    "## 2. Understanding the Model Architecture\n",
    "\n",
    "Before creating our plugin interface, let's understand the basic structure of Qwen 2.5:\n",
    "\n",
    "- Like most Transformer language models, it consists of an embedding layer followed by a series of decoder layers\n",
    "- Each layer contains a self-attention block and an MLP block\n",
    "- The self-attention block includes query, key, value, and output projections\n",
    "- The MLP block includes multiple linear projections\n",
    "- Qwen applies layer norms *before* the self-attention and MLP blocks\n",
    "\n",
    "To create an adapter interface, we need to map these components to the appropriate adapter hooks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a760598",
   "metadata": {},
   "source": [
    "## 3. Creating the Plugin Interface\n",
    "\n",
    "Now we'll create a plugin interface for Qwen 2.5 that maps the model's architecture to the adapter framework.\n",
    "\n",
    "‼️ The interface below for Qwen 2 and Qwen 2.5 already comes pre-supported in _Adapters_, so you could skip this section entirely! It's merely to showcase how you could define interfaces for your own custom models!\n",
    "\n",
    "You can find a list of all pre-supported models here: https://docs.adapterhub.ml/model_overview.html."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71f2ec5a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import adapters\n",
    "from adapters import AdapterModelInterface\n",
    "from transformers import AutoModelForCausalLM\n",
    "\n",
    "plugin_interface = AdapterModelInterface(\n",
    "    # Specify which adapter methods to enable\n",
    "    adapter_methods=[\"lora\", \"reft\", \"bottleneck\"],\n",
    "    \n",
    "    # Map the model's components to the adapter interface\n",
    "    model_embeddings=\"embed_tokens\",      # Embedding layer\n",
    "    model_layers=\"layers\",                # Transformer layers\n",
    "    layer_self_attn=\"self_attn\",          # Self-attention module in each layer\n",
    "    layer_cross_attn=None,                # Qwen doesn't have cross-attention\n",
    "    \n",
    "    # Projection matrices within the attention module\n",
    "    attn_k_proj=\"k_proj\",                 # Key projection\n",
    "    attn_q_proj=\"q_proj\",                 # Query projection\n",
    "    attn_v_proj=\"v_proj\",                 # Value projection\n",
    "    attn_o_proj=\"o_proj\",                 # Output projection\n",
    "    \n",
    "    # MLP projections\n",
    "    layer_intermediate_proj=\"mlp.up_proj\",  # Up projection in MLP\n",
    "    layer_output_proj=\"mlp.down_proj\",      # Downward projection in MLP\n",
    "\n",
    "    layer_pre_self_attn=\"input_layernorm\",  # Hook directly before self-attention\n",
    "    layer_pre_ffn=\"post_attention_layernorm\",  # Hook directly before MLP\n",
    "    # Qwen applies layer norms before attention and MLP, so no need to add them here\n",
    "    layer_ln_1=None,\n",
    "    layer_ln_2=None,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4df5be9c",
   "metadata": {},
   "source": [
    "Each parameter in the interface maps to specific module names in the model's architecture, allowing the adapter methods to hook into the right components."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5a555a4",
   "metadata": {},
   "source": [
    "## 4. Loading the Model and Initializing with the Interface\n",
    "\n",
    "Now, let's load the Qwen 2.5 model and initialize it with our plugin interface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff401e79",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the model\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"Qwen/Qwen2.5-1.5B\",  # Using the 1.5B version\n",
    "    device_map=\"auto\",  # Automatically distribute model across available GPUs\n",
    "    torch_dtype=\"bfloat16\",  # Use half-precision for faster computation\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3332362",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "# Load the tokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-1.5B\")\n",
    "\n",
    "# Set the pad token ID to be different from the model's EOS token\n",
    "tokenizer.pad_token_id = 151645\n",
    "model.config.pad_token_id = tokenizer.pad_token_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79829238",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the adapter framework with our plugin interface\n",
    "# Remove the interface argument to use the default interface\n",
    "adapters.init(model, interface=plugin_interface)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c9b508c",
   "metadata": {},
   "source": [
    "## 5. Adding and Training an Adapter\n",
    "\n",
    "With the interface in place, we can now add an adapter to our model.\n",
    "In this example, we'll train a [bottleneck adapter](https://docs.adapterhub.ml/methods.html#bottleneck-adapters). You can easily switch to [one of the other supported adapter methods](https://docs.adapterhub.ml/overview.html#table-of-adapter-methods) (e.g. LoRA) by changing the `adapter_config`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf62e4c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from adapters import SeqBnConfig, LoRAConfig\n",
    "\n",
    "# Add a LoRA adapter\n",
    "adapter_name = \"qwen-math-adapter\"\n",
    "adapter_config = SeqBnConfig(\n",
    "    reduction_factor=32,  # Bottleneck size\n",
    ")\n",
    "# Alternatively e.g.: \n",
    "# adapter_config = LoRAConfig(\n",
    "#     r=32,  # Rank of the low-rank decomposition\n",
    "#     alpha=16,  # Scaling factor for LoRA\n",
    "# )\n",
    "\n",
    "model.add_adapter(adapter_name, config=adapter_config)\n",
    "\n",
    "# Activate the adapter\n",
    "model.set_active_adapters(adapter_name)\n",
    "\n",
    "# Set the model to train only the adapter parameters\n",
    "model.train_adapter(adapter_name)\n",
    "\n",
    "# Verify adapter was correctly added\n",
    "print(model.adapter_summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0129bb8",
   "metadata": {},
   "source": [
    "## 6. Loading the GSM8K Dataset for Fine-tuning\n",
    "\n",
    "For this example, we'll use the GSM8K dataset to fine-tune our model for solving grade school math problems."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d38488e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "# Load the GSM8K dataset\n",
    "dataset = load_dataset(\"openai/gsm8k\", \"main\")\n",
    "print(dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd33582d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Explore sample data\n",
    "print(\"Sample question:\")\n",
    "print(dataset[\"train\"][0][\"question\"])\n",
    "print(\"\\nSample answer:\")\n",
    "print(dataset[\"train\"][0][\"answer\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd9f7c79",
   "metadata": {},
   "source": [
    "## 7. Preprocessing the Dataset\n",
    "\n",
    "We need to tokenize our math problems and their solutions for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6f8a2a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess_function(examples):\n",
    "    # Create full prompts with question and expected answer format\n",
    "    prompts = [\n",
    "        f\"Solve the following math problem step-by-step:\\n\\nQuestion: {q}\\n\\nAnswer: {a} <|endoftext|>\"\n",
    "        for q, a in zip(examples[\"question\"], examples[\"answer\"])\n",
    "    ]\n",
    "    \n",
    "    # Tokenize as regular sequences\n",
    "    tokenized = tokenizer(prompts, padding=\"max_length\", truncation=True, max_length=2048)\n",
    "    \n",
    "    # For causal language modeling, labels are the same as input_ids\n",
    "    tokenized[\"labels\"] = tokenized[\"input_ids\"].copy()\n",
    "    \n",
    "    return tokenized\n",
    "\n",
    "# Apply preprocessing to the dataset\n",
    "tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=[\"question\", \"answer\"])\n",
    "\n",
    "print(\"Dataset processed!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "412dabf9",
   "metadata": {},
   "source": [
    "## 8. Fine-tuning the Adapter\n",
    "\n",
    "Now we can fine-tune our adapter for solving math problems."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30339714",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import TrainingArguments\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "# Set up training arguments\n",
    "training_args = TrainingArguments(\n",
    "    output_dir=\"./qwen-math-adapter\",\n",
    "    per_device_train_batch_size=2,  # Increase or decrease based on GPU memory\n",
    "    per_device_eval_batch_size=2,\n",
    "    learning_rate=1e-4,\n",
    "    num_train_epochs=1,  # More epochs for complex task\n",
    "    save_steps=30,\n",
    "    eval_steps=30,\n",
    "    logging_steps=10,\n",
    "    evaluation_strategy=\"steps\",\n",
    "    load_best_model_at_end=True,\n",
    "    metric_for_best_model=\"loss\",  # Use loss as metric for best model\n",
    "    greater_is_better=False,  # Lower loss is better\n",
    "    push_to_hub=False,\n",
    "    gradient_accumulation_steps=8,  # Accumulate gradients to simulate larger batch sizes\n",
    "    bf16=True,  # Use mixed precision\n",
    "    warmup_ratio=0.1,  # Add some warmup steps\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd58d138",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split dataset into train and validation\n",
    "# Use a bugger/ smaller subset as needed\n",
    "train_dataset = tokenized_dataset[\"train\"].select(range(min(len(tokenized_dataset[\"train\"]), 4000)))\n",
    "eval_dataset = tokenized_dataset[\"test\"].select(range(min(len(tokenized_dataset[\"test\"]), 200)))\n",
    "\n",
    "print(f\"Training on {len(train_dataset)} examples and evaluating on {len(eval_dataset)} examples\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98c50119",
   "metadata": {},
   "outputs": [],
   "source": [
    "from adapters import AdapterTrainer\n",
    "from trl import DataCollatorForCompletionOnlyLM\n",
    "\n",
    "# Initialize the trainer\n",
    "trainer = AdapterTrainer(\n",
    "    model=model,\n",
    "    processing_class=tokenizer,\n",
    "    args=training_args,\n",
    "    data_collator=DataCollatorForCompletionOnlyLM(response_template=\"Answer:\", tokenizer=tokenizer),\n",
    "    train_dataset=train_dataset,\n",
    "    eval_dataset=eval_dataset,\n",
    ")\n",
    "\n",
    "# Train only the adapter parameters\n",
    "trainer.train()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5f4b99b",
   "metadata": {},
   "source": [
    "## 9. Saving and Loading the Adapter\n",
    "\n",
    "After training, we can save just the adapter weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb0ef8a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save only the adapter weights\n",
    "model.save_adapter(\"./qwen-math-adapter\", adapter_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14290123",
   "metadata": {},
   "source": [
    "## 10. Testing the Adapter\n",
    "\n",
    "Let's test our math problem-solving adapter on some new examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "663b3577",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "generator = pipeline(\n",
    "    \"text-generation\",\n",
    "    model=model,\n",
    "    tokenizer=tokenizer,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e0380a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's test the model with a few math problems\n",
    "test_examples = [\n",
    "    \"Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four eggs. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\",\n",
    "    \"Carlos is planting a lemon tree. The tree will cost $90 to plant. Each year it will grow 7 lemons, which he can sell for $1.5 each. It costs $3 a year to water and feed the tree. How many years will it take before he starts earning money on the lemon tree?\",\n",
    "    \"Two trains leave San Rafael at the same time. They begin traveling westward, both traveling for 80 miles. The next day, they travel northwards, covering 150 miles. What's the distance covered by each train in the two days?\"\n",
    "]\n",
    "\n",
    "# Format the test examples with the prompt template\n",
    "def to_prompt(text):\n",
    "    return f\"Solve the following math problem step-by-step:\\n\\nQuestion: {text}\\n\\nAnswer:\"\n",
    "\n",
    "for example in test_examples:\n",
    "    print(\"=\" * 50)\n",
    "    print(\"Problem:\")\n",
    "    print(example)\n",
    "    prompt = to_prompt(example)\n",
    "    output = generator(prompt, max_new_tokens=500, do_sample=False, return_full_text=False)\n",
    "    print(\"Solution:\")\n",
    "    print(output[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7d474e9",
   "metadata": {},
   "source": [
    "## 11. Conclusion\n",
    "\n",
    "In this notebook, we've demonstrated how to:\n",
    "\n",
    "1. Create a plugin interface for adding adapter support to Qwen 2.5\n",
    "2. Load and initialize the model with the adapter framework\n",
    "3. Add a bottleneck adapter to the model\n",
    "4. Fine-tune the adapter on the GSM8K math problem-solving task\n",
    "5. Save and reload the adapter weights\n",
    "6. Use the adapter for solving new math problems\n",
    "\n",
    "The plugin interface approach allows you to use parameter-efficient fine-tuning with any Transformer model, even those not officially supported by the adapters library."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "adapters",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
